Abstract: Transfer entropy is a measure of the magnitude and
the direction of information flow between jointly distributed
stochastic processes. In recent years, its permutation versions
have been considered in the literature; they estimate the transfer
entropy by counting the number of occurrences of orderings between
values, not of the values themselves. Here, we introduce the transfer
entropy rate and its permutation version, the symbolic transfer
entropy rate, and show that they are equal to each other for any
bivariate finite-alphabet stationary ergodic Markov process. Our
proof is based on the duality between values and orderings, which
was introduced in [T. Haruna and K. Nakajima, Physica D 240,
1370 (2011)] and may give a coherent basis for the relationship
between information-theoretic quantities and their permutation
versions defined on finite-alphabet stationary stochastic processes.
We also discuss the relationship among the transfer entropy rate,
the time-delayed mutual information rate and their permutation
versions.

Index Terms: Permutation entropy, transfer entropy, symbolic
transfer entropy, transfer entropy on rank vectors, Markov
process



I. Introduction 

QUANTIFYING networks of information flows is critical
for understanding the workings of complex systems such as
living, social and technological systems. Schreiber [1] introduced
the notion of transfer entropy to measure the magnitude and
the direction of information flow from one element to another
element emitting stationary signals in a given system. It has
been used to analyze information flows in real time series data
from neuroscience [2]-[11] and many other fields [12]-[19].

The notion of permutation entropy introduced by Bandt
and Pompe [20] has demonstrated that much of the information
contained in stationary time series can be captured by counting
occurrences of orderings between values, not those of the values
themselves [21]-[25]. In particular, it is known that the entropy
rate [26], which is one of the most fundamental quantities of
stationary stochastic processes, is equal to the permutation
entropy rate for any finite-alphabet stationary stochastic
process [27]. The method of permutation has been applied across
many disciplines [28].

The symbolic transfer entropy [29] is a permutation version
of the transfer entropy and has been used as an efficient and
conceptually simple way of quantifying information flows in
real time series data [29], [30], [31], [32]. Another permutation
version of the transfer entropy, called the transfer entropy on
rank vectors, has been introduced to improve the performance
of the symbolic transfer entropy [33]. So far, most of the
work on permutation versions of the transfer entropy is on the
application side. Although some theoretical results are known
for the original transfer entropy [34], [35], no theoretical
consideration has been given to its permutation versions, to
the best of the authors' knowledge. The aim of this paper is
to shed light on the relationship between the transfer entropy
and its permutation versions from a theoretical perspective.
In particular, we will consider the notions of transfer entropy
rate [36] and symbolic transfer entropy rate and show that they
are equal for any bivariate finite-alphabet stationary ergodic
Markov process.

Our approach is based on the duality between values and
orderings introduced by the authors [37]. In [37], the excess
entropy [38]-[45], which is an effective measure of complexity of
stationary stochastic processes, and its permutation version are
shown to be equal for any finite-alphabet stationary ergodic
Markov process. Our result in this paper demonstrates the broad
applicability of the duality between values and orderings for
discussing the relationship between information-theoretic
measures and their permutation versions defined on
finite-alphabet stationary stochastic processes.

This paper is organized as follows. In Section II, we
introduce the transfer entropy rate and the symbolic transfer
entropy rate. We also discuss some combinatorial facts used in
later sections. In Section III, we give a proof of the equality
between the transfer entropy rate and the symbolic transfer
entropy rate which holds for bivariate finite-alphabet stationary
ergodic Markov processes. In Section IV, we discuss the
relationship among the transfer entropy rate, the time-delayed
mutual information rate and their permutation versions.
Finally, in Section V, we give concluding remarks.

II. Definitions and Preliminaries 

Let $A_n = \{1, 2, \dots, n\}$ be a finite alphabet consisting of
natural numbers from 1 to $n$. In the following discussion,
$\mathbf{X} = \{X_1, X_2, \dots\}$ and $\mathbf{Y} = \{Y_1, Y_2, \dots\}$
are jointly distributed finite-alphabet stationary stochastic
processes, or equivalently, $(\mathbf{X}, \mathbf{Y})$ is a bivariate
finite-alphabet stationary stochastic process
$\{(X_1, Y_1), (X_2, Y_2), \dots\}$, where the stochastic variables
$X_i$ and $Y_j$ take their values in the alphabets $A_n$ and $A_m$,
respectively. We use the notation $X_1^L = (X_1, X_2, \dots, X_L)$
for simplicity. We write $p(x_1^{L_1}, y_1^{L_2})$ for the probability
of the occurrence of words $x_1^{L_1} \in A_n^{L_1}$ and
$y_1^{L_2} \in A_m^{L_2}$.

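Originally, the notion of transfer entropy was introduced as
a generalization of the entropy rate to bivariate processes [1].
Following this original motivation, here we do not consider the
transfer entropy but the transfer entropy rate [36] from
$\mathbf{Y}$ to $\mathbf{X}$, which is defined by

$$t(\mathbf{X}|\mathbf{Y}) = h(\mathbf{X}) - h(\mathbf{X}|\mathbf{Y}), \tag{1}$$

where $h(\mathbf{X}) = \lim_{L \to \infty} H(X_1^L)/L$ is the entropy
rate of $\mathbf{X}$,
$H(X_1^L) = -\sum_{x_1^L \in A_n^L} p(x_1^L) \log_2 p(x_1^L)$ is the
Shannon entropy of the occurrence of words of length $L$ in
$\mathbf{X}$, and $h(\mathbf{X}|\mathbf{Y})$ is the conditional entropy
rate of $\mathbf{X}$ given $\mathbf{Y}$ defined by

$$h(\mathbf{X}|\mathbf{Y}) = \lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L), \tag{2}$$

which always converges. $t(\mathbf{X}|\mathbf{Y})$ has the properties
that (i) $0 \le t(\mathbf{X}|\mathbf{Y}) \le h(\mathbf{X})$ and
(ii) $t(\mathbf{X}|\mathbf{Y}) = 0$ if $X_1^i$ is independent of
$Y_1^i$ for all $i \ge 1$.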

In order to introduce the notion of symbolic transfer entropy
rate, we define a total order on the alphabet $A_n$ by the usual
"less-than-or-equal-to" relationship. Let $S_L$ be the set of all
permutations of length $L \ge 1$. We consider each permutation
$\pi$ of length $L$ as a bijection on the set $\{1, 2, \dots, L\}$.
Thus, each permutation $\pi \in S_L$ can be identified with a
sequence $\pi(1) \cdots \pi(L)$. The permutation type
$\pi \in S_L$ of a given word $x_1^L \in A_n^L$ is defined by
re-ordering $x_1, \dots, x_L$ in ascending order, namely, $x_1^L$ is
of type $\pi$ if we have $x_{\pi(i)} \le x_{\pi(i+1)}$ and
$\pi(i) < \pi(i+1)$ when $x_{\pi(i)} = x_{\pi(i+1)}$ for
$i = 1, 2, \dots, L-1$. For example,
$\pi(1)\pi(2)\pi(3)\pi(4)\pi(5) = 41352$ for
$x_1^5 = 24213 \in A_4^5$ because $x_4 x_1 x_3 x_5 x_2 = 12234$.
The map $\phi_n : A_n^L \to S_L$ sends each word $x_1^L$ to its
unique permutation type $\pi = \phi_n(x_1^L)$.
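The permutation type can be computed with a stable sort of positions by value. The following is a minimal Python sketch; the function name `perm_type` is a hypothetical helper introduced here for illustration and is not part of the original text.

```python
def perm_type(word):
    """Return the permutation type of `word` as a list of 1-based positions.

    Positions are ordered by the value they carry; equal values are
    ordered by position, matching the tie-breaking rule in the text
    (pi(i) < pi(i+1) whenever x_pi(i) = x_pi(i+1)).
    """
    positions = range(1, len(word) + 1)
    return sorted(positions, key=lambda i: (word[i - 1], i))


# Worked example from the text: the word 24213 has permutation type 41352.
example_type = perm_type([2, 4, 2, 1, 3])
print(example_type)
```

Sorting on the pair (value, position) makes the tie-breaking rule explicit rather than relying on the stability of the sort.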

We will use the notions of rank sequences and rank
variables [27] in some situations. The rank sequences of
length $L$ are words $r_1^L \in A_L^L$ satisfying $1 \le r_i \le i$
for $i = 1, \dots, L$. The set of all rank sequences of length $L$ is
denoted by $\mathcal{R}_L$. It is clear that
$|\mathcal{R}_L| = L! = |S_L|$. Each word $x_1^L \in A_n^L$ can be
mapped to a rank sequence $r_1^L$ by defining
$r_i = \sum_{j=1}^{i} \delta(x_j \le x_i)$ for $i = 1, \dots, L$, where
$\delta(P) = 1$ if the proposition $P$ is true, and otherwise
$\delta(P) = 0$. We denote this map from $A_n^L$ to $\mathcal{R}_L$
by $\varphi_n$. It can be shown that the map
$\varphi_n : A_n^L \to \mathcal{R}_L$ is compatible with the map
$\phi_n : A_n^L \to S_L$ in the following sense: there exists a
bijection $\iota : \mathcal{R}_L \to S_L$ such that
$\iota \circ \varphi_n = \phi_n$. The rank variables associated with
$\mathbf{X}$ are defined by $R_i = \sum_{j=1}^{i} \delta(X_j \le X_i)$
for $i = 1, \dots, L$. In general, $\mathbf{R} = \{R_1, R_2, \dots\}$
is a non-stationary stochastic process.
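As an illustration of the rank map and of its compatibility with the permutation type, the sketch below computes rank sequences and checks by brute force, on one small alphabet and word length chosen arbitrarily, that rank sequences and permutation types are in one-to-one correspondence. The helper names `rank_sequence` and `perm_type` are assumptions made for this example.

```python
from itertools import product


def rank_sequence(word):
    """r_i = number of indices j <= i with x_j <= x_i (1-based ranks)."""
    return [
        sum(1 for j in range(i + 1) if word[j] <= word[i])
        for i in range(len(word))
    ]


def perm_type(word):
    """Permutation type: positions sorted by value, ties broken by position."""
    return sorted(range(1, len(word) + 1), key=lambda i: (word[i - 1], i))


# Rank sequence of the word 24213 used earlier in the text.
example_ranks = rank_sequence([2, 4, 2, 1, 3])
print(example_ranks)

# Brute-force compatibility check on all words over {1, 2, 3} of length 4:
# each rank sequence should occur with exactly one permutation type and
# vice versa, so the three counts below coincide.
pairs = {
    (tuple(rank_sequence(w)), tuple(perm_type(w)))
    for w in product(range(1, 4), repeat=4)
}
ranks = {r for r, _ in pairs}
perms = {p for _, p in pairs}
print(len(pairs), len(ranks), len(perms))
```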

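The symbolic transfer entropy rate from $\mathbf{Y}$ to $\mathbf{X}$
is defined by

$$t^*(\mathbf{X}|\mathbf{Y}) = h^*(\mathbf{X}) - h^*(\mathbf{X}|\mathbf{Y}), \tag{3}$$

where $h^*(\mathbf{X}) = \lim_{L \to \infty} H^*(X_1^L)/L$ is the
permutation entropy rate, which is known to exist and to be equal to
$h(\mathbf{X})$ [27],
$H^*(X_1^L) = -\sum_{\pi \in S_L} p(\pi) \log_2 p(\pi)$ is the Shannon
entropy of the occurrence of permutations of length $L$ in
$\mathbf{X}$, $p(\pi) = \sum_{\phi_n(x_1^L) = \pi} p(x_1^L)$, and
$h^*(\mathbf{X}|\mathbf{Y})$ is given by

$$h^*(\mathbf{X}|\mathbf{Y}) = \lim_{L \to \infty} \left( H^*(X_1^{L+1}, Y_1^L) - H^*(X_1^L, Y_1^L) \right) \tag{4}$$

if the limit on the right hand side exists. Here,
$H^*(X_1^{L_1}, Y_1^{L_2})$ is defined by
$H^*(X_1^{L_1}, Y_1^{L_2}) = -\sum_{\pi \in S_{L_1}, \pi' \in S_{L_2}} p(\pi, \pi') \log_2 p(\pi, \pi')$,
where $p(\pi, \pi') = \sum_{\phi_n(x_1^{L_1}) = \pi,\ \phi_m(y_1^{L_2}) = \pi'} p(x_1^{L_1}, y_1^{L_2})$.
Let $\mathbf{R} = \{R_1, R_2, \dots\}$ and
$\mathbf{S} = \{S_1, S_2, \dots\}$ be the rank variables associated
with $\mathbf{X}$ and $\mathbf{Y}$, respectively. By the
compatibility between $\varphi_k$ and $\phi_k$ for $k = m, n$, we have
$H(R_1^{L_1}, S_1^{L_2}) = H^*(X_1^{L_1}, Y_1^{L_2})$. Thus,
$h^*(\mathbf{X}|\mathbf{Y})$ can be written as
$h^*(\mathbf{X}|\mathbf{Y}) = \lim_{L \to \infty} H(R_{L+1} | R_1^L, S_1^L)$
if $h^*(\mathbf{X}|\mathbf{Y})$ exists.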
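To make the definition of the permutation entropy concrete, the following Python sketch pushes a word distribution forward through the permutation type map and compares the Shannon entropy of words with that of permutations. The toy distribution (i.i.d. uniform binary words of length 2) and all function names are assumptions chosen for illustration only.

```python
from collections import defaultdict
from itertools import product
from math import log2


def perm_type(word):
    """Permutation type as a tuple of 1-based positions (ties by position)."""
    return tuple(sorted(range(1, len(word) + 1), key=lambda i: (word[i - 1], i)))


def shannon_entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def permutation_distribution(word_dist):
    """p(pi) = sum of p(word) over all words whose permutation type is pi."""
    perm_dist = defaultdict(float)
    for word, p in word_dist.items():
        perm_dist[perm_type(word)] += p
    return dict(perm_dist)


# Toy example: i.i.d. uniform symbols from {1, 2}, words of length L = 2.
word_dist = {w: 0.25 for w in product((1, 2), repeat=2)}
perm_dist = permutation_distribution(word_dist)

H_words = shannon_entropy(word_dist)   # entropy of words, H(X_1^2)
H_perms = shannon_entropy(perm_dist)   # entropy of permutations, H*(X_1^2)
print(H_words, H_perms)
```

For a finite word length the two entropies generally differ (three of the four words here share the type 12); the equality discussed in the text concerns the rates obtained in the limit of large $L$.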

Note that the above definition of the symbolic transfer
entropy rate (3) is the rate version of the transfer entropy on
rank vectors [33], which is an improved version of the symbolic
transfer entropy [29]. Of course, we can directly consider the
rate version of the symbolic transfer entropy. However, its
definition is more complicated than the rate version of the
transfer entropy on rank vectors. Moreover, we can show that
the rate version of the symbolic transfer entropy is equal to
the rate version of the transfer entropy on rank vectors for
bivariate finite-alphabet ergodic Markov processes, the class
of stochastic processes considered in this paper$^1$. Hence, we
here take the formula (3) as the definition of the symbolic
transfer entropy rate.

III. Main Result 

In this section, we give a proof of the following claim:
for any bivariate finite-alphabet stationary ergodic Markov
process $(\mathbf{X}, \mathbf{Y})$,

$$t(\mathbf{X}|\mathbf{Y}) = t^*(\mathbf{X}|\mathbf{Y}). \tag{5}$$

In order to prove this claim, we introduce a map
$\mu : S_L \to \mathbb{N}^L$, where $\mathbb{N} = \{1, 2, \dots\}$ is
the set of all natural numbers ordered by the usual
"less-than-or-equal-to" relationship, by the following procedure:
First, given a permutation $\pi \in S_L$, we decompose the sequence
$\pi(1) \cdots \pi(L)$ into maximal ascending subsequences. A
subsequence $i_j \cdots i_{j+k}$ of a sequence $i_1 \cdots i_L$ is
called a maximal ascending subsequence if it is ascending, namely,
$i_j \le i_{j+1} \le \cdots \le i_{j+k}$, and neither
$i_{j-1} i_j \cdots i_{j+k}$ nor $i_j i_{j+1} \cdots i_{j+k+1}$ is
ascending. Second, if
$\pi(1) \cdots \pi(i_1),\ \pi(i_1 + 1) \cdots \pi(i_2),\ \dots,\ \pi(i_{k-1} + 1) \cdots \pi(L)$
is a decomposition of $\pi(1) \cdots \pi(L)$ into maximal ascending
subsequences, then we define a word $x_1^L \in \mathbb{N}^L$ by

$$x_{\pi(1)} = \cdots = x_{\pi(i_1)} = 1,\quad x_{\pi(i_1+1)} = \cdots = x_{\pi(i_2)} = 2,\quad \dots,\quad x_{\pi(i_{k-1}+1)} = \cdots = x_{\pi(L)} = k.$$

We define $\mu(\pi) = x_1^L$. For example, a decomposition of
$25341 \in S_5$ into maximal ascending subsequences is 25, 34, 1.
We obtain $\mu(\pi) = x_1 x_2 x_3 x_4 x_5 = 31221$ by putting
$x_2 x_5 x_3 x_4 x_1 = 11223$. By construction, we have
$\phi_n \circ \mu(\pi) = \pi$ when $\mu(\pi) \in A_n^L$ for any
$\pi \in S_L$.

$^1$The proof for this result will be presented elsewhere because it
requires generalization to hidden Markov processes, which is beyond
the scope of this paper.
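The construction of the word associated with a permutation can be sketched in a few lines of Python. The names `mu` and `perm_type` are hypothetical helpers for this illustration; the final loop checks by brute force, for permutations of length 4 only, that taking the permutation type of the constructed word returns the original permutation.

```python
from itertools import permutations


def perm_type(word):
    """Permutation type: positions sorted by value, ties broken by position."""
    return sorted(range(1, len(word) + 1), key=lambda i: (word[i - 1], i))


def mu(pi):
    """Label the maximal ascending runs of pi by 1, 2, ..., k and write
    label c at every position that appears in the c-th run."""
    word = [0] * len(pi)
    label = 1
    for idx, pos in enumerate(pi):
        if idx > 0 and pi[idx - 1] > pos:
            label += 1  # a descent closes the current run and opens a new one
        word[pos - 1] = label
    return word


# Worked example from the text: 25341 decomposes into 25, 34, 1.
example_word = mu([2, 5, 3, 4, 1])
print(example_word)

# Brute-force check on S_4 that the permutation type of mu(pi) is pi.
roundtrip_ok = all(
    perm_type(mu(list(p))) == list(p) for p in permutations(range(1, 5))
)
print(roundtrip_ok)
```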

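The map $\mu$ can be seen as a dual to the map $\phi_n$ (or
$\varphi_n$) in the following sense (Theorem 9 in [37]): Let us put

$$B_{n,L} = \{x_1^L \in A_n^L \mid \phi_n^{-1}(\pi) = \{x_1^L\} \text{ for some } \pi \in S_L\},$$
$$C_{n,L} = \{\pi \in S_L \mid |\phi_n^{-1}(\pi)| = 1\}.$$

Then, $\phi_n$ restricted on $B_{n,L}$ is a map into $C_{n,L}$, $\mu$
restricted on $C_{n,L}$ is a map into $B_{n,L}$, and they form a pair
of mutually inverse maps. Furthermore, we have the following
structural characterization of words in the set $B_{n,L}$:

$$B_{n,L} = \{x_1^L \in A_n^L \mid \text{for any } 1 \le i \le n-1 \text{ there exist } 1 \le j < k \le L \text{ such that } x_j = i+1,\ x_k = i\}. \tag{6}$$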
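The structural characterization of $B_{n,L}$ (for every $1 \le i \le n-1$ the symbol $i+1$ occurs somewhere before an occurrence of $i$) can be compared by brute force with the definition of $B_{n,L}$ as the set of words that are the only word of their permutation type. The Python sketch below does this for small $n$ and $L$ only; it is a numerical sanity check under these assumptions, not a proof, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def perm_type(word):
    """Permutation type as a tuple of 1-based positions (ties by position)."""
    return tuple(sorted(range(1, len(word) + 1), key=lambda i: (word[i - 1], i)))


def b_by_definition(n, L):
    """Words over {1..n} of length L that are alone in their type class."""
    classes = defaultdict(list)
    for w in product(range(1, n + 1), repeat=L):
        classes[perm_type(w)].append(w)
    return {ws[0] for ws in classes.values() if len(ws) == 1}


def satisfies_characterization(word, n):
    """For each i in 1..n-1, some i+1 occurs strictly before some i."""
    for i in range(1, n):
        found = any(
            word[j] == i + 1 and i in word[j + 1:] for j in range(len(word))
        )
        if not found:
            return False
    return True


def b_by_characterization(n, L):
    return {
        w
        for w in product(range(1, n + 1), repeat=L)
        if satisfies_characterization(w, n)
    }


small_case = b_by_definition(2, 3)
print(sorted(small_case))
agree = all(
    b_by_definition(n, L) == b_by_characterization(n, L)
    for n, L in [(2, 3), (2, 4), (3, 3), (3, 5)]
)
print(agree)
```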

Since $h(\mathbf{X}) = h^*(\mathbf{X})$ holds for any finite-alphabet
stationary process, proving (5) is equivalent to showing that the
equality

$$\lim_{L \to \infty} H(R_{L+1} | R_1^L, S_1^L) = \lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L) \tag{7}$$

holds for any bivariate finite-alphabet stationary ergodic
Markov process $(\mathbf{X}, \mathbf{Y})$. Without loss of generality,
we can assume that each $(x, y) \in A_n \times A_m$ appears with a
positive probability $p(x, y) > 0$. By the assumed ergodicity, for
any $\epsilon > 0$, if we take $L$ sufficiently large, then

$$\sum_{x_1^L \text{ satisfies } (*),\ y_1^L \text{ satisfies } (**)} p(x_1^L, y_1^L) > 1 - \epsilon, \tag{8}$$

where (*) is the condition that for any $x \in A_n$ there exist
$1 \le i \le \lfloor L/2 \rfloor < j \le L$ such that
$x = x_i = x_j$, (**) is the condition that for any $y \in A_m$ there
exist $1 \le i' \le \lfloor L/2 \rfloor < j' \le L$ such that
$y = y_{i'} = y_{j'}$, and $\lfloor r \rfloor$ is the largest integer
not greater than $r$ for a real number $r$.

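We put $D_{n,m,L} = \{(x_1^L, y_1^L) \mid (*) \text{ and } (**) \text{ hold}\}$
and
$E_{n,m,L} = \{(r_1^L, s_1^L) \mid \varphi_n(x_1^L) = r_1^L,\ \varphi_m(y_1^L) = s_1^L \text{ for some } (x_1^L, y_1^L) \in D_{n,m,L}\}$.
By (6), we have $x_1^L \in B_{n,L}$ and $y_1^L \in B_{m,L}$ for
$(x_1^L, y_1^L) \in D_{n,m,L}$. Thus, the map
$(x_1^L, y_1^L) \mapsto (\varphi_n(x_1^L), \varphi_m(y_1^L))$ is a
bijection from $D_{n,m,L}$ to $E_{n,m,L}$ due to the duality between
$\varphi_k$ and $\mu$ for $k = m, n$. In particular, we have
$p(x_1^L, y_1^L) = p(r_1^L, s_1^L)$ and
$p(r_{L+1} | r_1^L, s_1^L) = p(r_{L+1} | x_1^L, y_1^L)$ for
$(x_1^L, y_1^L) \in D_{n,m,L}$ and the corresponding
$(r_1^L, s_1^L) \in E_{n,m,L}$. Splitting both conditional entropies
according to $D_{n,m,L}$ and $E_{n,m,L}$, we obtain

$$
\begin{aligned}
& H(R_{L+1} | R_1^L, S_1^L) - H(X_{L+1} | X_1^L, Y_1^L) \\
&= \sum_{(x_1^L, y_1^L) \in D_{n,m,L}} p(x_1^L, y_1^L) \Big[ \sum_{x_{L+1}} p(x_{L+1} | x_1^L, y_1^L) \log_2 p(x_{L+1} | x_1^L, y_1^L) - \sum_{r_{L+1}} p(r_{L+1} | x_1^L, y_1^L) \log_2 p(r_{L+1} | x_1^L, y_1^L) \Big] \\
&\quad + \sum_{(x_1^L, y_1^L) \notin D_{n,m,L}} p(x_1^L, y_1^L) \sum_{x_{L+1}} p(x_{L+1} | x_1^L, y_1^L) \log_2 p(x_{L+1} | x_1^L, y_1^L) \\
&\quad - \sum_{(r_1^L, s_1^L) \notin E_{n,m,L}} p(r_1^L, s_1^L) \sum_{r_{L+1}} p(r_{L+1} | r_1^L, s_1^L) \log_2 p(r_{L+1} | r_1^L, s_1^L).
\end{aligned}
\tag{9}
$$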
Given any $\epsilon > 0$, let us take $L$ large enough so that
the inequality (8) holds. We shall evaluate each term on the
right hand side of (9). The absolute value of the second term in
(9) is bounded by $\epsilon \log_2 n$, which can be made arbitrarily
small. In order to show that the third term also converges to 0 as
$L \to \infty$, we use the Markov property: If
$(\mathbf{X}, \mathbf{Y})$ is ergodic Markov, then by a discussion
similar to that in Section 4 of [37] we can show that
$\sum_{(r_1^L, s_1^L) \notin E_{n,m,L}} p(r_1^L, s_1^L) \le C \lambda^L L^{\alpha}$
for some $C > 0$, $0 < \lambda < 1$ and positive integer $\alpha$.
Thus, the absolute value of the third term is bounded by the
quantity $C \lambda^L L^{\alpha} \log_2(L + 1)$, which goes to 0 as
$L \to \infty$. Finally, the first term is shown to be 0 by the
same discussion as in the proof of Lemma 1 of [27]: If
$(x_1^L, y_1^L) \in D_{n,m,L}$, then each symbol $x \in A_n$ appears
at least once in the word $x_1^L$ (indeed, it appears at least
twice). If $a_x$ is the number of occurrences of the symbol
$1 \le x \le n$ in the word $x_1^L$, then $a_x > 0$ for all
$1 \le x \le n$. Hence, given $(x_1^L, y_1^L) \in D_{n,m,L}$,
$x_{L+1} = x$ if and only if
$r_{L+1} = \sum_{x'=1}^{x} a_{x'} + 1$, and distinct symbols $x$ give
distinct ranks because every $a_x$ is positive.

Thus, given $(x_1^L, y_1^L) \in D_{n,m,L}$, the probability
distribution $p(r_{L+1} | x_1^L, y_1^L)$ is just a re-indexing of
$p(x_{L+1} | x_1^L, y_1^L)$, which implies that the first term is
exactly equal to 0. This completes the proof of our claim. From the
proof, we can also see that
$t^*(\mathbf{X}|\mathbf{Y}) \le t(\mathbf{X}|\mathbf{Y})$ holds for
any bivariate finite-alphabet stationary ergodic process
$(\mathbf{X}, \mathbf{Y})$ if $h^*(\mathbf{X}|\mathbf{Y})$ exists for
the process.
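The re-indexing step in the proof can be illustrated numerically. The Python sketch below computes the rank of an appended symbol directly from the definition of rank variables and from the occurrence counts $a_x$, and shows that the correspondence between the next symbol and its rank is one-to-one when every symbol already occurs in the word, but can fail otherwise. The example words and function names are assumptions chosen for illustration.

```python
from collections import Counter


def next_rank(word, x_next):
    """Rank of x_next appended to `word`: counts earlier symbols <= x_next,
    plus one for the appended symbol itself."""
    return 1 + sum(1 for x in word if x <= x_next)


def next_rank_from_counts(word, x_next):
    """Same rank, written with occurrence counts a_x of each symbol."""
    counts = Counter(word)
    return 1 + sum(counts[x] for x in range(1, x_next + 1))


n = 3

# Every symbol of {1, 2, 3} occurs in this word, so a_x > 0 for all x.
full_word = [2, 1, 3, 1, 2, 3]
full_ranks = {x: next_rank(full_word, x) for x in range(1, n + 1)}
print(full_ranks)

# The symbol 2 is missing here, so the symbols 1 and 2 give the same rank.
sparse_word = [1, 1, 3, 3]
sparse_ranks = {x: next_rank(sparse_word, x) for x in range(1, n + 1)}
print(sparse_ranks)
```

When the map from the next symbol to its rank is one-to-one, the conditional distribution of the rank is a relabeling of the conditional distribution of the symbol, so both have the same entropy.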

IV. On the Relationship with the Time-Delayed Mutual Information Rate

Apart from permutation, it is natural to ask whether the
equality for the conditional entropy rate

$$\lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L) = \lim_{L \to \infty} \frac{1}{L} H(X_1^{L+1} | Y_1^L) \tag{10}$$

holds or not. This is parallel to the equality for the entropy
rate
$\lim_{L \to \infty} H(X_{L+1} | X_1^L) = \lim_{L \to \infty} \frac{1}{L} H(X_1^{L+1})$,
which holds for any finite-alphabet stationary stochastic process
$\mathbf{X}$ [26]. In this section, we will see that this question is
closely tied to the relationship between the transfer entropy rate
and the time-delayed mutual information rate.

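In general, (10) does not hold. For example, if
$\mathbf{X} = \mathbf{Y}$, then we have
$\lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L) = h(\mathbf{X})$, while
$\lim_{L \to \infty} \frac{1}{L} H(X_1^{L+1} | Y_1^L) = 0$. However,
note that the inequality

$$\lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L) \ge \lim_{L \to \infty} \frac{1}{L} H(X_1^{L+1} | Y_1^L) \tag{11}$$

holds for any bivariate finite-alphabet stationary stochastic
process $(\mathbf{X}, \mathbf{Y})$. Indeed, we have

$$
\begin{aligned}
\lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L)
&= \lim_{L \to \infty} \frac{1}{L} \sum_{i=1}^{L+1} H(X_i | X_1^{i-1}, Y_1^{i-1}) \\
&\ge \lim_{L \to \infty} \frac{1}{L} \sum_{i=1}^{L+1} H(X_i | X_1^{i-1}, Y_1^L) \\
&= \lim_{L \to \infty} \frac{1}{L} H(X_1^{L+1} | Y_1^L),
\end{aligned}
$$

where the first equality is due to the Cesàro mean theorem (if
$\lim_{L \to \infty} b_L = b$ then
$\lim_{L \to \infty} \frac{1}{L} \sum_{i=1}^{L} b_i = b$), the
inequality holds because conditioning on the longer word $Y_1^L$
cannot increase the entropy, and the last equality follows from the
chain rule for the Shannon entropy. In the following, we give a
sufficient condition for (10).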

If there exists $N > 0$ such that if $i > N$ then $X_i$
is independent of $Y_i^{i+j}$ given $X_1^{i-1}, Y_1^{i-1}$ for any
$j \ge 0$, that is,

$$
\begin{aligned}
& \Pr(X_i = x_i, Y_i^{i+j} = y_i^{i+j} \mid X_1^{i-1} = x_1^{i-1}, Y_1^{i-1} = y_1^{i-1}) \\
&= \Pr(X_i = x_i \mid X_1^{i-1} = x_1^{i-1}, Y_1^{i-1} = y_1^{i-1}) \, \Pr(Y_i^{i+j} = y_i^{i+j} \mid X_1^{i-1} = x_1^{i-1}, Y_1^{i-1} = y_1^{i-1})
\end{aligned}
$$

for any $j \ge 0$, $x_k \in A_n$ ($1 \le k \le i$) and
$y_l \in A_m$ ($1 \le l \le i + j$), then (10) holds. Indeed, let
$a_{i,L} = H(X_{i+1} | X_1^i, Y_1^L)$. For a given $i$, $a_{i,L}$ is a
decreasing sequence of $L$. By the chain rule for the Shannon
entropy,
$H(X_1^{L+1} | Y_1^L) = H(X_{L+1} | X_1^L, Y_1^L) + H(X_L | X_1^{L-1}, Y_1^L) + \cdots + H(X_1 | Y_1^L) = a_{L,L} + a_{L-1,L} + \cdots + a_{0,L} = \sum_{i=0}^{L} a_{i,L}$.
However, by the assumption, we have
$a_{i,i} = a_{i,i+1} = a_{i,i+2} = \cdots$ for $i > N$. Hence, we
have
$H(X_1^{L+1} | Y_1^L) = \sum_{i=0}^{N} a_{i,L} + \sum_{i=N+1}^{L} a_{i,i}$.
Since the former sum is finite, by the Cesàro mean theorem, we
obtain
$\lim_{L \to \infty} \frac{1}{L} H(X_1^{L+1} | Y_1^L) = \lim_{L \to \infty} \frac{1}{L} \sum_{i=N+1}^{L} a_{i,i} = \lim_{L \to \infty} a_{L,L} = \lim_{L \to \infty} H(X_{L+1} | X_1^L, Y_1^L)$.
Note that if the assumption holds, then it holds for $N = 1$ by
stationarity. If $(\mathbf{X}, \mathbf{Y})$ is Markov, then we can
show that the assumption is equivalent to the following simpler
condition by using the Markov property:

$$p(x_2, y_2 | x_1, y_1) = p(x_2 | x_1, y_1) \, p(y_2 | x_1, y_1) \tag{12}$$

for any $x_1, x_2 \in A_n$ and $y_1, y_2 \in A_m$.
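Condition (12) says that, given the current pair $(x_1, y_1)$, the next symbols $x_2$ and $y_2$ are conditionally independent. For a transition kernel given as a table, this can be checked directly. The following Python sketch assumes the kernel is stored as a nested dict `P[(x1, y1)][(x2, y2)]`; this data layout, the two toy kernels and the function name are assumptions introduced for the example.

```python
from itertools import product


def satisfies_condition_12(P, n, m, tol=1e-12):
    """Check p(x2, y2 | x1, y1) = p(x2 | x1, y1) * p(y2 | x1, y1)
    for every current state (x1, y1) and every next state (x2, y2)."""
    xs, ys = range(1, n + 1), range(1, m + 1)
    for state in product(xs, ys):
        row = P[state]
        # Conditional marginals of the next x and the next y.
        px = {x2: sum(row[(x2, y2)] for y2 in ys) for x2 in xs}
        py = {y2: sum(row[(x2, y2)] for x2 in xs) for y2 in ys}
        for x2, y2 in product(xs, ys):
            if abs(row[(x2, y2)] - px[x2] * py[y2]) > tol:
                return False
    return True


states = list(product((1, 2), (1, 2)))

# Kernel A: the next x tends to copy the current y, the next y is a fair
# coin drawn independently, so the joint kernel factorizes.
P_factorized = {
    (x1, y1): {
        (x2, y2): (0.9 if x2 == y1 else 0.1) * 0.5 for (x2, y2) in states
    }
    for (x1, y1) in states
}

# Kernel B: the next x and the next y are always equal, so they are not
# conditionally independent given the current state.
P_coupled = {
    s: {(x2, y2): (0.5 if x2 == y2 else 0.0) for (x2, y2) in states}
    for s in states
}

print(satisfies_condition_12(P_factorized, 2, 2))
print(satisfies_condition_12(P_coupled, 2, 2))
```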
If (10) holds, then we obtain

$$t(\mathbf{X}|\mathbf{Y}) = \lim_{L \to \infty} \frac{1}{L} I(X_1^{L+1}; Y_1^L), \tag{13}$$

where $I(A; B)$ is the mutual information between stochastic
variables $A$ and $B$. We call the quantity on the right hand side
of (13) the time-delayed mutual information rate and denote it by
$i_{+1}(\mathbf{X}; \mathbf{Y})$. Note that we have
$t(\mathbf{X}|\mathbf{Y}) \le i_{+1}(\mathbf{X}; \mathbf{Y})$ for any
bivariate finite-alphabet stationary stochastic process
$(\mathbf{X}, \mathbf{Y})$ by the inequality (11).

For any bivariate finite-alphabet stationary stochastic
process $(\mathbf{X}, \mathbf{Y})$, one can see that
$i_{+1}(\mathbf{X}; \mathbf{Y}) = h(\mathbf{X}) + h(\mathbf{Y}) - h(\mathbf{X}, \mathbf{Y})$
holds. Hence, we have
$i_{+1}(\mathbf{X}; \mathbf{Y}) = i_{+1}(\mathbf{Y}; \mathbf{X}) = i(\mathbf{X}; \mathbf{Y}) = i(\mathbf{Y}; \mathbf{X})$.
Here,
$i(\mathbf{X}; \mathbf{Y}) = \lim_{L \to \infty} \frac{1}{L} I(X_1^L; Y_1^L)$
is the mutual information rate between $\mathbf{X}$ and
$\mathbf{Y}$. Thus, when we consider the rate for mutual information
between two jointly distributed finite-alphabet stationary
stochastic processes $\mathbf{X}$ and $\mathbf{Y}$, which is defined
in the limit $L \to \infty$, time delay has no significance, in
contrast to the time-delayed mutual information which has been used
in time series embedding [46] and detection of nonlinear
interdependence between two time series at different time points
[47], [48].

The result on the relationship between $t(\mathbf{X}|\mathbf{Y})$ and
$i_{+1}(\mathbf{X}; \mathbf{Y})$ when $(\mathbf{X}, \mathbf{Y})$ is
Markov can be summarized as follows: We have (13) if the condition
(12) holds for any $x_1, x_2 \in A_n$ and $y_1, y_2 \in A_m$, when
$(\mathbf{X}, \mathbf{Y})$ is Markov.

Another interesting case is when both $(\mathbf{X}, \mathbf{Y})$ and
$\mathbf{Y}$ are Markov. In this case, a necessary and sufficient
condition for (13) can be derived easily: If both
$(\mathbf{X}, \mathbf{Y})$ and $\mathbf{Y}$ are Markov, then we have
(13) if and only if the condition

$$p(x_2, y_2 | x_1, y_1) = p(x_2 | x_1, y_1) \, p(y_2 | y_1) \tag{14}$$

holds for any $x_1, x_2 \in A_n$ and $y_1, y_2 \in A_m$. Proving this
claim is equivalent to showing
$h(\mathbf{X}|\mathbf{Y}) = h(\mathbf{X}, \mathbf{Y}) - h(\mathbf{Y})$.
By using the Markov property, we obtain
$h(\mathbf{X}|\mathbf{Y}) - h(\mathbf{X}, \mathbf{Y}) + h(\mathbf{Y}) = H(X_1, X_2 | Y_1) - H(X_1, X_2 | Y_1, Y_2) \ge 0$.
In the last inequality, we have the equality if and only if $Y_2$ is
independent of $X_1^2$ given $Y_1$, that is,

$$p(x_1, x_2, y_2 | y_1) = p(x_1, x_2 | y_1) \, p(y_2 | y_1) \tag{15}$$

for any $x_1, x_2 \in A_n$ and $y_1, y_2 \in A_m$, which is equivalent
to the condition in the claim.

Let us introduce the symbolic time-delayed mutual information
rate by

$$i^*_{+1}(\mathbf{X}; \mathbf{Y}) = \lim_{L \to \infty} \frac{1}{L} I^*(X_1^{L+1}; Y_1^L), \tag{16}$$

where
$I^*(X_1^{L+1}; Y_1^L) = H^*(X_1^{L+1}) + H^*(Y_1^L) - H^*(X_1^{L+1}, Y_1^L)$,
and discuss the relationship with the transfer entropy rate, the
symbolic transfer entropy rate and the (time-delayed) mutual
information rate. $i^*_{+1}(\mathbf{X}; \mathbf{Y})$ exists for any
bivariate finite-alphabet stationary stochastic process, as we
will see below.

Properties similar to those of the time-delayed mutual information
rate hold for the symbolic time-delayed mutual information
rate: First, we note that
$t^*(\mathbf{X}|\mathbf{Y}) \le i^*_{+1}(\mathbf{X}; \mathbf{Y})$
holds for any bivariate finite-alphabet stationary stochastic process
$(\mathbf{X}, \mathbf{Y})$ such that $h^*(\mathbf{X}|\mathbf{Y})$
exists, because an inequality similar to (11) holds for the
permutation versions of the corresponding quantities. Second, the
symbolic time-delayed mutual information rate also admits the
following expression:

$$i^*_{+1}(\mathbf{X}; \mathbf{Y}) = h^*(\mathbf{X}) + h^*(\mathbf{Y}) - h^*(\mathbf{X}, \mathbf{Y}), \tag{17}$$

where
$h^*(\mathbf{X}, \mathbf{Y}) = \lim_{L \to \infty} H(R_1^L, S_1^L)/L$.
Thus, if we introduce the symbolic mutual information rate between
$\mathbf{X}$ and $\mathbf{Y}$ by
$i^*(\mathbf{X}; \mathbf{Y}) = \lim_{L \to \infty} \frac{1}{L} I^*(X_1^L; Y_1^L)$,
then we have
$i^*_{+1}(\mathbf{X}; \mathbf{Y}) = i^*_{+1}(\mathbf{Y}; \mathbf{X}) = i^*(\mathbf{X}; \mathbf{Y}) = i^*(\mathbf{Y}; \mathbf{X})$.

Since the symbolic time-delayed mutual information rate
is a sum of permutation entropy rates, it holds that for any
bivariate finite-alphabet stationary stochastic process
$(\mathbf{X}, \mathbf{Y})$,
$i_{+1}(\mathbf{X}; \mathbf{Y}) = i^*_{+1}(\mathbf{X}; \mathbf{Y})$.
Hence, we obtain the following two claims: (i) If
$(\mathbf{X}, \mathbf{Y})$ is ergodic Markov and (12) holds
for any $x_1, x_2 \in A_n$ and $y_1, y_2 \in A_m$, then we have
$t^*(\mathbf{X}|\mathbf{Y}) = t(\mathbf{X}|\mathbf{Y}) = i_{+1}(\mathbf{X}; \mathbf{Y}) = i^*_{+1}(\mathbf{X}; \mathbf{Y})$.
(ii) If $(\mathbf{X}, \mathbf{Y})$ is ergodic Markov, $\mathbf{Y}$ is
Markov and (14) holds for any $x_1, x_2 \in A_n$ and
$y_1, y_2 \in A_m$, then we have
$t^*(\mathbf{X}|\mathbf{Y}) = t(\mathbf{X}|\mathbf{Y}) = i_{+1}(\mathbf{X}; \mathbf{Y}) = i^*_{+1}(\mathbf{X}; \mathbf{Y})$.

V. Concluding Remarks 

In this paper, we proved that the equality between the
transfer entropy rate and the symbolic transfer entropy rate holds
for any bivariate finite-alphabet stationary ergodic Markov
process, which is the first theoretical result on permutation
versions of the transfer entropy. We also discussed the
relationship between these quantities and the time-delayed mutual
information rate and its permutation version.

The next natural question is how far the condition for (5) can be
weakened. At present, the authors are aware that the equality (5)
can be extended at least to any finite-state finite-alphabet
hidden Markov process whose state transition matrix is irreducible,
by almost the same discussion as in the ergodic Markov case.
Research results along this line will be presented elsewhere.

We hope that our proof technique based on the duality
between $\phi_n$ and $\mu$, which is called the duality between
values and orderings in [37], opens up a systematic study of the
relationship between information-theoretic quantities and
their permutation versions.